Abstract

The abundance of online user data has led to a surge of interest in
understanding the dynamics of social relationships using computational
methods. Utilizing users’ item adoption data, we develop a new method
to compute the Granger-causal (GC) relationships among users. To
handle the high-dimensional and sparse nature of the adoption data,
we propose to model the relationships among users in latent space instead
of the original data space. We devise a Linear Dynamical Topic Model
(LDTM) that captures the dynamics of the users’ item adoption
behaviors in latent (topic) space. Using the time series of temporal topic
distributions learned by LDTM, we conduct Granger causality tests to
measure the social correlation relationships between pairs of users. We call
the combination of our LDTM and Granger causality tests Temporal
Social Correlation. By conducting extensive experiments on bibliographic
data, where authors are analogous to users, we show that the ordering of
authors’ names on their publications plays a statistically significant role
in the interaction of research topics among the authors. We also present
a case study to illustrate the correlational relationships between pairs of
authors.


1 Introduction 

Rapid advances in social media and internet technologies have led to the
generation of massive user data in digital form. This gives rise to an important
question: How do users relate to and socially influence one another? Social
influence is the mechanism by which a user modifies her behavior or attributes so as to
be more similar to the users she is socially connected to. For many decades, social
scientists have recognized the importance of social influence in contributing to homophily
in social networks, and have embarked on research that determines and measures
the effect of social influence on homophily [32, 23]. Measuring social influence
has many practical applications; for instance, it provides an effective means to
target influential individuals for product marketing, or to identify pivotal
people in an organization for optimizing corporate management as well as driving
innovation.

In this paper, we define social influence from a user i to another user j as
“the actions of i cause j to perform a set of actions in the future”. Social
influence has been previously studied by various researchers [20, 39, 55, 15].
However, their approaches do not take into account the temporal aspects of
social influence. Instead of analyzing users’ past and future actions, they consider
user actions independently of their timestamps.

Many existing works also fail to account for the causality aspect of social
influence. Showing that user i’s past actions predict user j’s future actions
better than j’s own past actions do is only a necessary condition, not a sufficient
one, for finding social influence. Since the definition of social influence reflects the
widely discussed notion of causality [27, 43], the sufficient condition for finding
social influence requires us to exclude other external factors that could affect
the actions of j. That is, we need to eliminate the confounding variables that
cast doubt on the predictive power of i’s and j’s past on j’s future [28].

It is generally difficult, however, to satisfy this sufficient condition, due to
the absence of complete user data capturing all external factors that influence
the users’ actions. There is also a need to conduct randomized controlled
experiments [29, 47], which is very challenging in practice. Given these difficulties,
we relax our assumptions and use a simplified notion of social influence, known
as Temporal Social Correlation (TSC), whereby we ignore the presence of
confounding variables, and assume that users who are socially correlated tend to
make similar choices over time.

1.1 Problem Formulation 

We apply the aforementioned notion of social influence for the analysis of users’ 
items adoption behavior. We use the term “users adopting items” to refer to 
any action of a user on items reflecting her preferences. The concept of users 
adopting items can be applied in various contexts, e.g., users watching movies, 
users joining online communities [16, 18], or users producing words [17]. 

In this research, we model social influence-driven changes in users’ adoption
behavior as a form of information transfer between users. We first obtain
the time series representation of users’ behavior using our proposed Linear
Dynamical Topic Model (LDTM), then we quantify the information transfer
using Granger causality (GC) tests [28], resulting in the derived Temporal Social
Correlation (TSC) values between two users. We say that “j follows i” or “i
transfers information to j” when Temporal Social Correlation (TSC) exists from
i to j at the time point of their interaction r; we denote this as $TSC(i \to j, r)$.
With respect to our simplified notion of social influence, we shall hereafter use
the terms “follow”, “information transfer” or “Granger cause” in place of “social
influence”, since causality cannot be proven adequately without randomized
experiments. It is also worth noting that $TSC(i \to j, r)$ and $TSC(j \to i, r)$ are
not necessarily the same.

Figure 1: Example of Temporal Social Correlation from i to j in temporal
adoption data

As an illustration of Temporal Social Correlation in temporal item adoption
data, consider the pedagogical example in Figure 1. The figure shows two users
i and j adopting different subsets of five items over three time steps. When
temporal information is missing, we can only observe the adoption states at
the last time step (i.e., t = 3), and based on the most recent states alone we
cannot tell whether i follows j or j follows i. Only by observing the adoption
states at t = 1 and t = 2 can we infer that j progressively follows i in adopting
item c at t = 2 and item d at t = 3. The converse is unlikely because i’s adoption
states remain the same over time. In other words, i’s adoption states at t = 1
are sufficient to predict her states for t > 1.

We can further generalize the example in Figure 1, and arrive at the following
problem formulation: Given a set of users U and a set of items V that the users in U
adopt from time step 1 to T, determine $TSC(i \to j, r)$ and $TSC(j \to i, r)$
for all pairs of users $i, j \in U$ when i and j interact at a specific time point
$r \in \{1, \ldots, T\}$. When $TSC(i \to j, r) > TSC(j \to i, r)$, we can say that i
influences j, or j follows i.
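
The directional comparison in this formulation can be sketched in a few lines. The sketch below assumes, purely for illustration, that each direction is scored by the F-statistic of a lag-1 Granger test on scalar time series; the actual test procedure is the one given in Section 5, and it operates on topic distributions rather than on the toy series used here. The helper names (`solve`, `ols_rss`, `granger_f`) and the synthetic data are hypothetical.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols_rss(rows, y):
    """Residual sum of squares of the least-squares fit of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f(source, target):
    """F-statistic for 'the past of source helps predict target' (one lag)."""
    y = target[1:]
    steps = range(len(target) - 1)
    restricted = [[1.0, target[t]] for t in steps]          # target's own past
    full = [[1.0, target[t], source[t]] for t in steps]     # plus source's past
    rss_r, rss_f = ols_rss(restricted, y), ols_rss(full, y)
    dof = len(y) - 3                                        # 3 fitted coefficients
    return (rss_r - rss_f) / (rss_f / dof)

# Synthetic data (hypothetical): j copies i with a one-step delay plus noise.
rng = random.Random(0)
i_series = [rng.gauss(0.0, 1.0) for _ in range(60)]
j_series = [0.0] + [0.8 * i_series[t] + 0.1 * rng.gauss(0.0, 1.0) for t in range(59)]

f_ij = granger_f(i_series, j_series)   # evidence that j follows i
f_ji = granger_f(j_series, i_series)   # evidence that i follows j
```

On this synthetic pair, the score from i to j should dominate the score from j to i, which is the asymmetry the formulation relies on.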

1.2 Measuring Causality in Adoption Data 

To quantify $TSC(i \to j, r)$ in item adoption data, one can take a straightforward
approach, directly derived from the data. First, the raw frequencies of the
adopted items for users i and j at time step t can be represented as adoption
vectors $v_{i,t} \in \mathbb{R}^M$ and $v_{j,t} \in \mathbb{R}^M$ respectively, where M is the total number of items.
The vectors for each user i over T time steps form a time series $\{v_{i,1}, \ldots, v_{i,T}\}$. (An
additional normalization step, e.g., Term Frequency and Inverse Document
Frequency (TF-IDF), may be performed a priori on the raw frequencies to balance
the importance of popular and unique items.) Subsequently, one can compare
the time series $\{v_{i,1}, \ldots, v_{i,T}\}$ and $\{v_{j,1}, \ldots, v_{j,T}\}$ and measure social influence
by computing $TSC(i \to j, r)$ and $TSC(j \to i, r)$.
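
A minimal sketch of how such adoption vectors might be assembled from a log of (time step, item) events is shown below. The function names, the event format, and the toy data are assumptions made for illustration; the TF-IDF normalization mentioned above is omitted.

```python
from collections import Counter

def adoption_vectors(events, items, num_steps):
    """Return per-step raw count vectors for one user.

    events: iterable of (time_step, item) pairs, time_step in 1..num_steps.
    items:  ordered list of all M items (fixes the vector layout).
    """
    index = {item: m for m, item in enumerate(items)}
    counts = [Counter() for _ in range(num_steps)]
    for t, item in events:
        counts[t - 1][index[item]] += 1
    return [[c[m] for m in range(len(items))] for c in counts]

def cumulative(vectors):
    """Accumulate counts so that step t includes all adoptions up to t."""
    total, out = [0] * len(vectors[0]), []
    for v in vectors:
        total = [a + b for a, b in zip(total, v)]
        out.append(total)
    return out

# Hypothetical toy log: five items, three time steps, one user.
items = ["a", "b", "c", "d", "e"]
events_j = [(1, "a"), (2, "c"), (3, "d")]

per_step = adoption_vectors(events_j, items, num_steps=3)
running = cumulative(per_step)
```

Even in this toy case each vector has a single non-zero entry out of M = 5, which hints at how sparse the representation becomes when M is large.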

Despite the simplicity, this direct approach gives rise to several issues: 

1. The adoption vectors $v_{i,t}$ are usually high dimensional in practice, i.e., the
number of items M is often large. As a result, comparing the vectors $v_{i,t}$ and
$v_{j,t}$ of two users i and j would be computationally demanding (even with
a linear-time algorithm).

2. A related issue is the sparse nature of the adoption vectors $v_{i,t}$, since each
user only adopts a small subset of items at a particular time t. Comparing
two sparse vectors will hardly yield any indication of a significant relationship
between them, because we ignore the co-occurrences of different items
adopted by the users.

3. Since the adoption counts accumulate over time, the rate of change in $v_{i,t}$
relative to its previous time step $v_{i,t-1}$ will gradually decay and become
marginally small. At this point, the time series representing user i’s
behavior becomes stagnant. As TSC measures how users change their
behavior due to other users, stagnant time series can hardly show any
correlation effects among the interacting users.

4. If the time series $\{v_{i,1}, \ldots, v_{i,T}\}$ and $\{v_{j,1}, \ldots, v_{j,T}\}$ of users i and j are
observed for a long period (i.e., large T), their comparison may give a
misleading conclusion that no influence exists, because the TSC between
the two users typically takes place within a specific time window.
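
Issue 3 can be made concrete with a small numeric sketch. It assumes a hypothetical user who adopts exactly one item per time step and measures the change between consecutive cumulative vectors relative to the size of the current vector; the measure itself is chosen only for illustration.

```python
def relative_change(prev, curr):
    """L1 change between consecutive vectors, relative to the size of curr."""
    diff = sum(abs(c - p) for p, c in zip(prev, curr))
    return diff / sum(abs(c) for c in curr)

# Hypothetical user who adopts one unit of some item at every time step.
num_items, num_steps = 4, 8
history = []
total = [0] * num_items
for t in range(num_steps):
    total = total[:]              # copy so earlier steps are not mutated
    total[t % num_items] += 1     # one new adoption at this step
    history.append(total)

changes = [relative_change(history[t - 1], history[t])
           for t in range(1, num_steps)]
```

Although the user is equally active at every step, the relative change shrinks as 1/2, 1/3, ..., 1/8, so the accumulated series flattens out over time.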

1.3 Proposal and Contributions 

To address issues 1) and 2), there is a need for a temporal latent factor model 
that can induce from sparse and high dimensional data, a compressed latent 
representation of the adoption behaviors over time. The latent representation 
should also exhibit good semantic interpretability. To handle issue 3), one may 
learn the users’ latent factors at each time step independently. However, such a
naive approach is biased towards the most recent information and subject to
catastrophic ignorance of the past behaviors. Normally, a user does not change 
her behavior abruptly and there should be a smooth, decaying transition of 
the user’s latent factors over time. In consideration of the necessity for smooth 
transition of users’ latent factors, we develop a method to automatically estimate 
a set of decay parameters for balancing between the importance of past and 
recent information. Finally, to address issue 4), we need to specify a time 
window for constraining the comparison period in which TSC is measured. 

To fulfill these requirements, we propose in this paper a novel Linear
Dynamical Topic Model (LDTM). The proposed model represents each user’s adoption
behavior as a topic distribution (i.e., latent factors) at different time steps, and
the evolution of the topic distribution is captured using the concept of Linear
Dynamical Systems (LDS) [19, 25, 46, 54]. Based on the topic distributions
learned by LDTM, we can then conduct Granger causality tests to determine
the social influence between pairs of users.

Deviating from the traditional methods that operate on the original data
space, the proposed LDTM provides a novel inductive approach facilitating
discovery of social influence and causality in latent topic space. To the best of our
knowledge, LDTM is also the first dynamic topic model that comprehensively
models the dynamics of the users’ adoption behaviors by leveraging
the LDS concept. We summarize our key contributions as follows:

1. Utilizing LDS to model the transition of topic distributions over time, we
can automatically compute, for each user n, a dynamics matrix $A_{n,t}$ that
contains the decay parameters of the user’s adoption behavior at every
time step. Such transition modeling via a dynamics matrix $A_{n,t}$ has not
been proposed in any topic model.

2. For LDS estimation on the topic distribution parameters, we develop a
forward inference algorithm based on the idea of the Kalman Filter (KF) [31].
The optimization of the dynamics matrix $A_{n,t}$ is done in such a way that it
maintains the notion of decaying adoption behavior, while ensuring that
the temporal correlations of parameters in each topic distribution remain
numerically stable.

3. For inference of the decay parameters in $A_{n,t}$, we develop a new alternative
method that aims at minimizing the Kullback-Leibler (KL) divergence
between the expected posterior distribution at time step t and the expected
prior distribution at time step t. This approach conforms nicely to the
notion of smooth transition in the users’ latent factors. In addition, it is
computationally simpler and more efficient than traditional LDS methods,
which first perform the Rauch-Tung-Striebel (RTS) smoothing algorithm
[44] for backward inference and then an additional optimization step to
derive the dynamics matrix [25, 46].

4. Based on the temporal topic distributions derived by LDTM, we are able to 
identify information transfer between pairs of users by means of Granger 
causality tests. Through extensive experiments on bibliographic data, 
including the DBLP and ACMDL datasets, we find evidence of Granger
causality among the paper co-authors. Our statistical significance tests 
also reveal that the ordering of the co-authors’ names plays a role in 
determining the information transfer among them. 

The remainder of this paper is organized as follows. In Section 2, we first
review several works related to our research. Section 3 discusses several
desiderata in modeling temporal adoption data. The proposed LDTM is subsequently
presented in Section 4, followed by the procedure for the Granger causality test
in Section 5. Section 6 presents the experimental results and discussions using
the bibliographic datasets. Finally, Section 7 concludes this paper.

2 Related Work


We first introduce the classical concepts of social influence in Section 2.1, and
present a review of the existing latent factor approaches for modeling temporal
data in Section 2.2. We also cover in Section 2.3 some related works that use
latent factor modeling and influence concepts, but in a significantly
different way from our approach.

2.1 Social Influence 

To eliminate confounding variables for proving the existence of social
influence, researchers use randomized experiments that involve treatment and
control groups. [6] created a Facebook application to test whether broadcast or
personalized messages have social influence on friends of a recruited user. [11]
conducted experiments on Facebook users to study whether online political
messages could influence the voting decisions of users. [42] studied how the votes
of news articles affected the articles’ discussions.

An alternative to randomized experiments is to perform quasi-experiments.
This approach is similar to the traditional randomized experiments, but lacks
the element of random assignment to treatment or control. Instead,
quasi-experimental designs typically allow us to control the assignment to the
treatment condition, but using some criterion other than random assignment. [5]
adapted a matched sampling technique to Yahoo! Messenger data to distinguish
between influence and homophily in the adoption of a mobile service
application (Yahoo! Go). [4] proposed the shuffle test to distinguish influence from
homophily.

Research on social influence has revolved around the adoption of a single 
item and satisfaction of the confounding condition. The research we pursue in 
this paper is different in several ways. First, we consider a set of items adopted 
by users instead of just a single item. Second, we propose LDTM to translate the 
high-dimensional set of items adopted by users into a low-dimensional temporal 
latent representation. 

While existing works prove the existence of social influence in users’ adoption
of an item, the social influence is expressed as a discrete value that simply
indicates presence or absence of influence. By contrast, we propose to use the
Granger causality measure to quantify the level of social influence between every
pair of users, indicating how correlated their adoption behaviors are over time.

2.2 Temporal Latent Factor Models 

In general, there are two forms of temporal latent factor models. The first
form seeks to obtain more accurate latent factors in the temporal domain by
obtaining latent factors that globally approximate the observed data [3, 9, 33,
34, 54, 60, 63]. The other form, known as online learning, focuses on the efficiency
of handling real-time streaming data by maximizing the likelihood of the latent
factors to fit the observed data from the most recent time window only [2, 13,
14, 26, 30, 41, 48, 59].

We note that the online learning models are extensions of latent factor
models that are themselves not necessarily designed for dynamic data, but for
efficient learning of new model parameters given new additional data. In this paper,
we are concerned with modeling user behavioral data using dynamic latent
factor models, rather than with the online learning of the dynamic models.

[9] proposed the Dynamic Topic Model (DTM) for text documents. DTM
extends Latent Dirichlet Allocation (LDA) [10] to model the evolution
of words within topics, i.e., words prominently used in a particular topic at a
particular time step will be replaced by a different set of words at a later time.
However, our requirement is slightly different. Instead of the evolution of
topic-word distributions, we focus on the evolution of document-topic distributions.

The evolution of document-topic distributions has not been considered
previously in LDA-based models (e.g. DTM), because LDA is mainly used for
modeling text documents that remain static over time. Hence, DTM does not
consider the evolution of users’ behavior in the way we do. When we apply
LDA for modeling users’ behavior, the users take the role of the documents,
while the adopted items take the role of the words. In our work, we assume that
topic-item distributions remain static over time while the human users evolve their
preferences over time. Since the generative process in DTM does not meet our
temporal requirements, we are motivated to develop LDTM, which extends static
LDA by utilizing the concepts of Linear Dynamical System (LDS).

For modeling users’ behavior, [3] used an exponential decay function to 
model the decay of users’ search intent on search engines. But they assume 
that the parameters of the decay function remain constant for all topics and 
all users. On the contrary, we assume that there is a decay parameter for each 
topic and that the decay parameters vary for each user. We aim to estimate the 
decay parameters automatically, which are representative of the users’ temporal 
behavior. 

To automatically determine the natural decay of each topic, [60] proposed a
non-Markovian approach that models the trend of topic evolution. The key idea
is to associate an additional Beta distribution with each topic in order to generate
the time stamps of the words sampled from the topics. But this approach
assumes that each topic is only relevant for a specific time period, and does
not directly model the evolution of user behavior.

Latent factor models have also been widely used for collaborative filtering
in recommendation tasks, and several researchers have proposed dynamic latent
factor models for handling temporal data [33, 34, 54, 63]. However, these
approaches have always been focused on predicting users’ ratings on items, and
so their models cannot be directly applied for modeling users’ item adoptions.
Nevertheless, owing to the similarity in the fundamental concept of dynamicity
in latent factors, we give an overview of these works here.

[33, 34] developed TimeSVD++ to address temporal dynamics through a
specific parameterization with factors drifting from a central time. Koren
assumed that users’ item ratings remain static over time, since users do not rate
the same items in different time periods. However, in the item adoption scenario,
users could adopt the same items at different time periods with different
frequencies.

[63] extended the factorization of users’ item ratings from a static $\mathbb{R}^{M \times N}$
matrix to a $\mathbb{R}^{M \times N \times T}$ tensor, where N, M, and T represent the number of
users, items, and time steps respectively. Three sets of latent factors were
derived from their tensor factorization method (rather than just two in the matrix
factorization case). The additional set of latent factors, known as the time latent
factors, can be used to derive the temporal users’ and items’ latent factors
through multiplication. But such time latent factors assume that the items’ latent
factors evolve over time in the same way as the users’ latent factors. However,
we require the items’ latent factors to remain static, while allowing only the
users’ latent factors to change.

[54] proposed Dynamic Matrix Factorization (DMF), which uses Linear
Dynamical Systems (LDS) [25, 46, 51, 53]. The centerpiece of this work is a
dynamic state-space model that builds upon probabilistic matrix factorization
[50, 49] and Kalman filtering/smoothing [31, 44] in order to provide
recommendations in the presence of process and measurement noise. Although the LDS
component of DMF is able to model the evolution of users’ behavior, the latent
factors obtained by [54] are not constrained to be non-negative. Hence, their
approach is not able to provide an intuitive interpretation of the preferences in
users’ adoption behavior.

To summarize, all these prior works fail to satisfy the following requirements
for inferring temporal social dependencies between users: 1) They are not
explicitly designed to model item adoption data. 2) They do not obtain non-negative
latent factors for easy interpretation of the users’ behavior. 3) They neither
assume that users’ behavior can decay over time nor show how the users’ behavior
can evolve over time.

We combine the LDA and LDS approaches to obtain LDTM. Our LDTM is 
able to model the users’ items adoption data, obtain probabilistic (non-negative) 
latent factors for characterizing user behavior over time, and automatically infer 
the optimal decay parameters for each user at different time steps. 

2.3 Topic-based Influence Measures 

To model a set of items for diffusion or inferring influence, many authors [20,
22, 24, 39, 40, 55, 56, 57, 61, 62] have also turned to the use of topic models.
[20] proposed an influence matrix to suggest what items a user should share
to maximize their individual influence in their own community. Their matrix
measures influence between users and items, while ours measures influence between users
and users. Similar to [45], [62] extended PageRank [12] to include topic models
in the computation of influence between users.

[24] extended dynamic LDA to identify the most influential documents in
a scientific corpus. But the dynamic LDA assumes that the documents’ latent
factors evolve only with small perturbations while words’ latent factors evolve
over time. The work of [24] differs from our approach, because we allow greater
variability in users’ (documents’) latent factors and assume that items’ (words’)
latent factors remain constant.

A notable contender to our approach is the work proposed by [57]. They
also use topic models to reduce the dimensionality of item adoptions, followed
by analysis using an information-theoretic measure of causality known as
transfer entropy. The algorithm to estimate transfer entropy is based on the nearest
neighbor approach developed in statistical physics [35, 36, 58]. But there are
significant drawbacks to this approach. First, it makes no assumption on the
joint distributions of the variables, and thus requires many time steps for
achieving accurate estimation. Second, it ignores the temporal correlations between users’
topic distributions and the users’ behavior evolution.

Apart from [24, 57], none of the prior works that use latent factors
uses the time information when inferring influence. In this respect, our work goes
beyond the norm by considering temporal users’ item adoptions and proposing
several temporal models. We distinguish our work by devising a Linear
Dynamical System (LDS) approach to linearly correlate the users’ topic distributions,
and using Granger causality that likewise assumes a linear relationship among
variables. Due to this linear assumption in the Granger causality measure, our
method also requires fewer time steps to derive an accurate measure
of social influence between (pairs of) users.

3 Desiderata in Modeling Temporal Adoption Data

Measuring TSC between two users i and j requires two crucial steps. First,
an accurate measure of the users’ adoption behavior represented as time series
vectors in latent space is required for every time step. That is, we require latent
factors $\theta_{i,t}, \theta_{j,t} \in \mathbb{R}^K$ for each user pair (i, j) at time step t. $\theta_{i,t}$ and $\theta_{j,t}$ have K
dimensions, where K is much smaller than M (the total number of items). Second,
a temporal correlation measure is needed to compare the trends
of two time series. Knowing how two time series temporally correlate should
help us make better predictions or reduce our uncertainty about their future
adoption behavior. However, we need to address some issues in modeling temporal
adoption data, as elaborated in Sections 3.1 and 3.2.

3.1 Latent Representation of Temporal Adoption Data 

We propose a new way of representing user’s adoption behavior in temporal 
latent space as opposed to the traditional method of using only the frequency of 
adoption in high dimensional space. There are some advantages of representing 
adoption behavior in temporal latent space as well as some difficulties, which 
we will elaborate further. 

For illustration, consider the temporal item adoption problem in Figure 2,
involving three users $\{u_1, u_2, u_3\}$ and eight items $\{w_1, \ldots, w_8\}$ over two time
steps. If we model the topic distributions at each time step independently of
other time steps, we would obtain the scenarios in Figure 2(a) for time step 1,
and Figure 2(b) for time step 2. One may see that the edges between users
and items are sparse, which does not allow us to draw any meaningful
intuitions about the relationships among items and does not show us any common item
adoptions among the users.

Figure 2: Topic Modeling in Temporal User Item Adoptions. (a) t = 1; (b) t = 2

However, when we combine the temporal adoptions into a single time step,
we obtain the scenario illustrated in Figure 3(a). Figure 3(a) shows the
result of performing topic modeling on data without temporal considerations.
The items adopted by users $u_1$, $u_2$ and $u_3$ are clustered according to topics 1 and
2 based on the density of edges between users and items. We therefore require
a method of modeling the temporal adoptions that allows us to preserve
the edge densities across time steps and provides us with the topic distributions
at different time steps. Such a model could combine the temporal adoptions and
construct dependencies between different time steps, as in the scenario
shown in Figure 3(b).

Figure 3: Topic Modeling in Static User Item Adoptions

3.2 The Need for Temporal Probabilistic Topic Model

There are many ways of modeling users’ adoption behavior in latent spaces,
and we wish to justify our choice of using a probabilistic topic model. Besides
probabilistic methods, one may use Non-negative Matrix Factorization (NMF)
to obtain low-rank matrices that can substitute for the users’ and items’ latent
factors [64, 38]. Our previous work in temporal item adoptions has also explored
the use of LDS with NMF [19] for modeling evolving users’ preferences. LDS
with NMF can be stated as follows,

$$x_{n,t} = A_{n,t-1} \cdot x_{n,t-1} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, Q)$$

$$w_{n,t,m} = C_m \cdot x_{n,t}$$

where $x_{n,t} \in \mathbb{R}^K$ is the vector representing user n’s adoption behavior at time
step t, $A_{n,t-1} \in \mathbb{R}^{K \times K}$ is the dynamics matrix which evolves the user’s behavior
from time $t - 1$ to t, $w_{n,t,m} \in \mathbb{R}$ is the number of times user n adopts item m
at time t, and $C_m \in \mathbb{R}^K$ represents item m’s latent factor.
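
To make the state-space form concrete, the following sketch rolls the two equations forward for a single user. It simplifies the setting in ways that are assumptions of the sketch rather than of [19]: one fixed dynamics matrix is used for all time steps, the noise is isotropic with standard deviation `noise_std` instead of a full covariance Q, and all names and numbers are hypothetical.

```python
import random

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_rk * x_k for a_rk, x_k in zip(row, x)) for row in a]

def simulate(a, c, x0, num_steps, noise_std, rng):
    """Roll x_t = A x_{t-1} + eps forward and emit w_{t,m} = C_m . x_t."""
    x, states, outputs = x0, [], []
    for _ in range(num_steps):
        x = [v + rng.gauss(0.0, noise_std) for v in matvec(a, x)]
        states.append(x)
        outputs.append(matvec(c, x))   # one modelled adoption count per item m
    return states, outputs

# K = 2 latent factors, M = 3 items; the first factor decays by half each step.
A = [[0.5, 0.0],
     [0.0, 1.0]]
C = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

states, outputs = simulate(A, C, x0=[2.0, 1.0], num_steps=2,
                           noise_std=0.0, rng=random.Random(0))
```

With the noise switched off, the first latent factor halves at each step while the second stays fixed, and the modelled counts for the three items follow directly from the rows of C.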

In [19], we estimated the items latent factor matrix $C \in \mathbb{R}^{M \times K}$ for NMF
by minimizing the sum-of-squared errors via stochastic gradient descent (SGD)
with non-negativity constraints. The model was subsequently solved as an
instance of the Expectation Maximization (EM) algorithm [8, 21], where the E-step
carries out Kalman Filtering and RTS Smoothing, and the M-step serves to
optimize the dynamics matrix $A_{n,t}$.

In order to obtain interpretable topics, it is imperative that the items latent
factor matrix contains only non-negative values [19]. We often rank the importance
of items according to the items’ values in the respective latent factor, but
this ranking is not valid if the latent factors contain negative values: a negative $c_{m,k}$
can also be important in contributing to the value $w_{n,t,m}$ if the corresponding
$x_{n,t,k}$ is also negative.

Due to the different amounts of item adoptions for each user at different
time steps, a single static matrix C that is defined in real space $\mathbb{R}^{M \times K}$ does not
fit the adoption patterns of every user well. C was also only estimated once
before running the EM algorithm to estimate the rest of the parameters.

There is thus a strong requirement for a non-negative items’ latent factor
matrix that is normalized across different time steps and is estimated by an
algorithm that updates the items’ latent factors iteratively while learning the
other parameters. Probabilistic approaches give us normalized parameters that
sum to one and are non-negative (since probabilities cannot be less than zero).
By alternating Gibbs Sampling with the Kalman Filter and additional optimization
steps to derive the dynamics matrix, we derive an algorithm, summarized in
Algorithm 1, to estimate all the necessary parameters and achieve an overall
better fit to the observed data.


4 Linear Dynamical Topic Model 

Figure 4 shows the probabilistic graphical representation (a.k.a. Bayesian
network) of LDTM using a plate diagram. In essence, LDTM is a combination of
Latent Dirichlet Allocation (LDA) and Linear Dynamical System (LDS). We
obtain the user-topic distributions at each time step by inferring the latent topic
variable conditioned on the words written in each time step and the topic-item
distributions.

4.1 Modeling Assumptions 

We assume that the topic-item distribution remains static over time, while the
users’ topic distribution evolves over time through a linear dynamical process
conditioned on the previous time steps and the inferred latent variables in the
current time step. We further elaborate our assumptions of LDTM as follows:

1. Given that there are K topics and temporal adoption data, the topic
distribution $\theta_{n,t}$ of user n at time step t is defined by the Dirichlet distribution
with parameters $x_{n,t} \in \mathbb{R}^K$.

$$\theta_{n,t} \sim \mathrm{Dir}(x_{n,t})$$

2. To relate the current parameters $x_{n,t}$ with the previous time step
parameters $x_{n,t-1}$, we assume a linear relationship as defined by,

$$x_{n,t} = A_{n,t-1} \cdot x_{n,t-1}$$

where $A_{n,t} \in \mathbb{R}^{K \times K}$ represents the dynamics matrix of user n at t. This
step distinguishes our model from all other topic models, i.e., we model the
evolution of users’ topic distribution using a dynamics matrix. We also
derive a whole new set of inference equations for estimating the model
parameters in Section 4.3.

3. The topic $z_{n,t,m}$ of an item m adopted by user n at time t is given by,

$$z_{n,t,m} \sim \mathrm{Mult}(\theta_{n,t})$$

Each topic-item distribution is given by a simple symmetric Dirichlet
distribution,

$$\phi_k \sim \mathrm{Dir}(\beta)$$

Then each item m adopted by user n at time t conditioned on topic variable
$z_{n,t,m}$ is given by,

$$(w_{n,t,m} \mid z_{n,t,m} = k) \sim \mathrm{Mult}(\phi_k)$$

Figure 4: Probabilistic graphical representation of the proposed LDTM
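
The generative assumptions above can be read as a sampling procedure. The sketch below draws adoptions for one user at one time step, given Dirichlet parameters $x_{n,t}$ and topic-item distributions $\phi_k$; it uses only Python's standard library, and the function names and parameter values are hypothetical. The linear step that produces $x_{n,t}$ from $x_{n,t-1}$ is not shown.

```python
import random

def sample_dirichlet(params, rng):
    """Draw from Dir(params) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]

def generate_adoptions(x_nt, phi, num_adoptions, rng):
    """Sample (topic, item) pairs for one user at one time step.

    x_nt: Dirichlet parameters x_{n,t} (length K, all positive).
    phi:  K topic-item distributions, each a list of M probabilities.
    """
    theta = sample_dirichlet(x_nt, rng)            # theta_{n,t} ~ Dir(x_{n,t})
    adoptions = []
    for _ in range(num_adoptions):
        z = rng.choices(range(len(theta)), weights=theta)[0]     # z ~ Mult(theta)
        m = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # item ~ Mult(phi_z)
        adoptions.append((z, m))
    return theta, adoptions

rng = random.Random(1)
K, M, beta = 2, 4, 0.5
phi = [sample_dirichlet([beta] * M, rng) for _ in range(K)]      # phi_k ~ Dir(beta)
theta, adoptions = generate_adoptions([2.0, 1.0], phi, num_adoptions=20, rng=rng)
```

Each returned pair records the latent topic and the adopted item index, so counting the topics per user and time step yields the kind of statistic that the inference in Section 4.2 works with.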

4.2 Estimating Topic Distribution Parameters 

To calculate TSC, we require the topic distributions for each user $n$ at each
time step $t$ conditioned on the information up to $t$, denoted by
$\theta_{n,t|t}$ and also known as the posterior topic distribution. Since we
have defined $\theta_{n,t}$ as a Dirichlet distribution with parameters
$x_{n,t}$, knowing $x_{n,t|t}$ is sufficient for deriving $\theta_{n,t|t}$.
The posterior topic distribution $\theta_{n,t|t}$ of user $n$ at time $t$,
conditioned on information up to time step $t$, is given by,

$$\theta_{n,t|t} \sim \mathrm{Dir}(x_{n,t|t})$$

The posterior parameters $x_{n,t|t}$ of the Dirichlet distribution for user
$n$ at time $t$, conditioned on information up to time step $t$, are given by
a slight modification of the Kalman Filter [31] algorithm,

$$x_{n,t|t} = x_{n,t|t-1} + \psi_{n,t}$$

where $\psi_{n,t} \in \mathbb{R}^{K}$ and $\psi_{n,t,k}$ denotes the number of
times user $n$ at time $t$ generated topic $k$. The prior parameters
$x_{n,t|t-1}$ of the Dirichlet distribution for user $n$ at time $t$,
conditioned on information up to time step $t-1$, are given by,

$$x_{n,t|t-1} = A_{n,t-1} \cdot x_{n,t-1|t-1}$$

where $A_{n,t-1} \in \mathbb{R}^{K \times K}$ is the dynamics matrix that
evolves the parameters from $t-1$ to $t$. If $A_{n,t}$ for all time steps $t$
is assumed to be an identity matrix, the model reduces to the traditional LDA
model for temporal data sets.
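The predict/update recursion above is short enough to trace by hand. The sketch below assumes a diagonal dynamics matrix stored as a list of per-topic factors, and assumes that no pseudo-counts exist before the first time step; both are illustrative choices, as are the function names.

```python
def predict(mu_prev, x_post_prev):
    """Prior parameters x_{t|t-1} = A_{t-1} x_{t-1|t-1} for a diagonal A."""
    return [m * x for m, x in zip(mu_prev, x_post_prev)]

def update(x_prior, psi):
    """Posterior parameters x_{t|t} = x_{t|t-1} + psi_t."""
    return [x + c for x, c in zip(x_prior, psi)]

def filter_user(counts, decay):
    """Run the recursion over all time steps of one user.

    counts[t][k] is psi_{n,t,k}; decay[t][k] is the k-th diagonal entry of
    the dynamics matrix applied when moving from step t to step t + 1.
    """
    x_post = [0.0] * len(counts[0])   # assumption: nothing observed before t = 1
    history = []
    for t, psi in enumerate(counts):
        x_prior = x_post if t == 0 else predict(decay[t - 1], x_post)
        x_post = update(x_prior, psi)
        history.append(x_post)
    return history

counts = [[3, 0, 1], [0, 2, 2], [1, 1, 0]]
identity = [[1.0] * 3] * 2
half = [[0.5] * 3] * 2
lda_like = filter_user(counts, identity)  # counts accumulate without decay
decayed = filter_user(counts, half)       # older counts are halved at each step
print(lda_like[-1])  # [4.0, 3.0, 3.0]
print(decayed[-1])   # [1.75, 2.0, 1.25]
```

With the identity matrix the posterior parameters are plain cumulative topic counts, which is the sense in which the model falls back to LDA on temporal data.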

In previous works that use LDA on static data, little emphasis is placed on
the distinction between posterior and prior distributions. In temporal data,
it is more important to distinguish between the two, as the posterior
parameters of time step $t-1$ become the prior parameters at time step $t$
after factoring in the dynamics matrix $A_{n,t-1}$.

4.3 Estimating the Decay Parameters for Dynamics Matrix

Since the dynamics matrix $A_{n,t-1}$ gives us the prior distribution
$\theta_{n,t|t-1}$, an ideal dynamics matrix should be able to predict the
posterior distribution $\theta_{n,t|t}$ well. Therefore, finding the optimal
dynamics matrix requires us to minimize the divergence between the expected
prior and expected posterior distributions. A simple divergence to use is the
Kullback-Leibler (KL) divergence. We minimize the KL divergence between the
expected posterior topic distribution $\theta_{n,t|t}$ and the expected prior
topic distribution $\theta_{n,t|t-1}$ of user $n$ at time $t$ as follows,

$$\text{minimize} \; \sum_{t=2}^{T} D_{KL}\left[ E(\theta_{n,t|t}) \,\|\, E(\theta_{n,t|t-1}) \right]$$

The KL divergence is defined in terms of topic counts and Dirichlet parameters:

$$E(\theta_{n,t|t}) = \frac{A_{n,t-1} \cdot x_{n,t-1|t-1} + \psi_{n,t} + \alpha}{\mathbf{1}^{\top}(A_{n,t-1} \cdot x_{n,t-1|t-1} + \psi_{n,t}) + K\alpha}, \qquad E(\theta_{n,t|t-1}) = \frac{A_{n,t-1} \cdot x_{n,t-1|t-1} + \alpha}{\mathbf{1}^{\top} A_{n,t-1} \cdot x_{n,t-1|t-1} + K\alpha}$$

$$D_{KL}\left[ E(\theta_{n,t|t}) \,\|\, E(\theta_{n,t|t-1}) \right] = \sum_{k=1}^{K} E(\theta_{n,t,k|t}) \left[ \log E(\theta_{n,t,k|t}) - \log E(\theta_{n,t,k|t-1}) \right]$$

We then find the dynamics matrix $A_{n,t-1}$ that minimizes the objective
function $\mathcal{L}$ given by Equation 1,

$$\mathcal{L} = \sum_{t=2}^{T} \sum_{k=1}^{K} E(\theta_{n,t,k|t}) \left[ \log E(\theta_{n,t,k|t}) - \log E(\theta_{n,t,k|t-1}) \right] \tag{1}$$

Assume that $A_{n,t-1}$ is a diagonal matrix with entries $\mu_{n,t-1,k}$.

Taking this into consideration, we can now minimize $\mathcal{L}$ by
performing gradient descent with respect to the parameters $\mu_{n,t-1,k}$.
The gradient term is given in Equation 2,


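$$\frac{\partial \mathcal{L}}{\partial \mu_{n,t-1,k}} = \frac{\partial E(\theta_{n,t,k|t})}{\partial \mu_{n,t-1,k}} \log \frac{E(\theta_{n,t,k|t})}{E(\theta_{n,t,k|t-1})} + E(\theta_{n,t,k|t}) \left[ \frac{\partial \log E(\theta_{n,t,k|t})}{\partial \mu_{n,t-1,k}} - \frac{\partial \log E(\theta_{n,t,k|t-1})}{\partial \mu_{n,t-1,k}} \right] \tag{2}$$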


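where the individual components can be respectively solved as,

$$\frac{\partial E(\theta_{n,t,k|t})}{\partial \mu_{n,t-1,k}} = \frac{x_{n,t-1,k|t-1} \left( \mathbf{1}^{\top}\psi_{n,t} + K\alpha \right) - \left( \psi_{n,t,k} + \alpha \right) \mathbf{1}^{\top} x_{n,t-1|t-1}}{\left[ \mathbf{1}^{\top}(A_{n,t-1} \cdot x_{n,t-1|t-1} + \psi_{n,t}) + K\alpha \right]^{2}} \tag{3}$$

$$\frac{\partial \log E(\theta_{n,t,k|t})}{\partial \mu_{n,t-1,k}} = \frac{1}{E(\theta_{n,t,k|t})} \frac{\partial E(\theta_{n,t,k|t})}{\partial \mu_{n,t-1,k}} \tag{4}$$

$$\frac{\partial \log E(\theta_{n,t,k|t-1})}{\partial \mu_{n,t-1,k}} = \frac{x_{n,t-1,k|t-1}}{\mu_{n,t-1,k} \cdot x_{n,t-1,k|t-1} + \alpha} - \frac{\mathbf{1}^{\top} x_{n,t-1|t-1}}{\mathbf{1}^{\top} A_{n,t-1} \cdot x_{n,t-1|t-1} + K\alpha} \tag{5}$$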


According to Siddiqi et al. [52], an LDS is Lyapunov (a.k.a. numerically)
stable if the eigenvalues of the dynamics matrix $A_{n,t}$ are less than or
equal to one in magnitude. For a non-negative matrix, this is guaranteed if
the sum of each row is less than or equal to one. To ensure stability of
LDTM, we enforce $\mu_{n,t,k}$ to stay within [0,1]. By staying within the
[0,1] range, $\mu_{n,t,k}$ is also able to represent the decay of the
parameters learned in previous time steps.
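A minimal sketch of how the decay parameters could be fitted under the [0,1] constraint is given below. It is not the exact procedure of this section: it handles a single time step of a single user, it replaces the analytic gradient with central finite differences, and it adds a simple step-halving rule so that the objective never increases. All names and numeric values are illustrative assumptions. Because the dynamics matrix is diagonal, its eigenvalues are its diagonal entries, so clipping them to [0,1] keeps the system stable.

```python
import math

def objective(mu, x_prev_post, psi, alpha):
    """KL between the expected posterior and expected prior topic distributions."""
    K = len(mu)
    x_prior = [m * x for m, x in zip(mu, x_prev_post)]
    x_post = [x + c for x, c in zip(x_prior, psi)]
    prior_den = sum(x_prior) + K * alpha
    post_den = sum(x_post) + K * alpha
    total = 0.0
    for k in range(K):
        p = (x_post[k] + alpha) / post_den
        q = (x_prior[k] + alpha) / prior_den
        total += p * (math.log(p) - math.log(q))
    return total

def numerical_gradient(f, mu, eps=1e-6):
    """Central finite differences; a stand-in for the analytic gradient."""
    grad = []
    for k in range(len(mu)):
        up = mu[:k] + [mu[k] + eps] + mu[k + 1:]
        down = mu[:k] + [mu[k] - eps] + mu[k + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def fit_decay(x_prev_post, psi, alpha, step=0.05, iters=200):
    """Projected gradient descent: step along -gradient, then clip to [0, 1]."""
    mu = [0.5] * len(psi)
    f = lambda m: objective(m, x_prev_post, psi, alpha)
    for _ in range(iters):
        grad = numerical_gradient(f, mu)
        candidate = [min(1.0, max(0.0, m - step * g)) for m, g in zip(mu, grad)]
        if f(candidate) < f(mu):
            mu = candidate        # accept only steps that lower the objective
        else:
            step *= 0.5           # otherwise shrink the step size
    return mu

x_prev_post, psi, alpha = [8.0, 1.0, 1.0], [0.0, 6.0, 0.0], 0.1
start = objective([0.5, 0.5, 0.5], x_prev_post, psi, alpha)
mu_hat = fit_decay(x_prev_post, psi, alpha)
end = objective(mu_hat, x_prev_post, psi, alpha)
```

In a full implementation the loop would run over every user and every time step, with the analytic gradient in place of `numerical_gradient`.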


4.4 Outline of Parameter Estimation 


Algorithm 1 summarizes the procedure to estimate all parameters of the LDTM
model depicted in Figure 4. It begins by randomly initializing the latent
variables, followed by Gibbs sampling iterations which consist of several
steps.

First, the prior topic distributions are estimated using the Kalman Filter.
Then the latent variable $z_{n,t,m}$ is sampled by conditioning on the prior
parameters $x_{n,t|t-1}$, the sampled variables $\psi_{n,t}$ from previous
iterations and the constant parameters $\alpha$, $\beta$.

$$p(z_{n,t,m} = k \mid x_{n,t|t-1}, \psi_{n,t}, \xi_k, \beta) \propto \left( x_{n,t,k|t-1} + \psi_{n,t,k} + \alpha \right) \cdot \frac{\xi_{k,m} + \beta}{\mathbf{1}^{\top}\xi_k + K\beta} \tag{6}$$

Using the sampled latent variables, we derive the posterior parameters
$x_{n,t|t}$ via the Kalman Filter. Finally, we estimate the dynamics matrix
$A_{n,t}$ via gradient descent using Equation (2) to minimize the KL
divergence between the prior and posterior distributions. We repeat these
steps until a maximum number of iterations is reached.
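The sampling step for a single adopted item can be sketched as follows. The function names, the toy counts, and the smoothing values are illustrative assumptions. The item-side normalizer uses K times beta, following the form of Equation (6); a multiple of the vocabulary size is the more common choice for a symmetric Dirichlet over items and would be a one-line change.

```python
import random

def topic_weights(m, x_prior, psi, xi, alpha, beta):
    """Unnormalized p(z = k | ...) following the form of Equation (6)."""
    K = len(x_prior)
    weights = []
    for k in range(K):
        user_part = x_prior[k] + psi[k] + alpha
        # Normalizer as in Equation (6); see the note above on K * beta.
        item_part = (xi[k][m] + beta) / (sum(xi[k]) + K * beta)
        weights.append(user_part * item_part)
    return weights

def resample_item(m, z_old, x_prior, psi, xi, alpha, beta, rng):
    """Remove the item's current assignment, draw a new topic, add it back."""
    psi[z_old] -= 1
    xi[z_old][m] -= 1
    weights = topic_weights(m, x_prior, psi, xi, alpha, beta)
    z_new = rng.choices(range(len(weights)), weights=weights)[0]
    psi[z_new] += 1
    xi[z_new][m] += 1
    return z_new

rng = random.Random(0)
x_prior = [0.5, 0.5]              # x_{n,t|t-1} for K = 2 topics
psi = [1, 2]                      # this user's topic counts at time t
xi = [[2, 0, 1], [0, 3, 1]]       # topic-by-item counts over all users
z_new = resample_item(0, 0, x_prior, psi, xi, alpha=0.1, beta=0.01, rng=rng)
```

The decrement before sampling and the increment afterwards keep the count tables consistent, so the totals in `psi` and `xi` are unchanged by a resampling step.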


5 Computing Temporal Social Correlation in Topic 
Space using Granger Causality 

After obtaining the posterior topic distributions $\theta_{n,t|t}, \forall n
\in U$, we can construct time series of the distributions and calculate the
Temporal Social Correlation (TSC) using Granger causality (GC) [27]. For a
pair of users $(i, j)$, TSC can be measured in two directions,
$\mathrm{TSC}(i \to j, \tau)$ and $\mathrm{TSC}(j \to i, \tau)$, pivoted at a
specific time step $\tau$. One should appropriately choose $\tau$ to indicate
the starting point for information transfer between $i$ and $j$. Given
$\tau$, we could then select a time window $[\tau - W, \tau + L]$ to constrain
the time series used for comparison, where $L$ is the number of time steps to
“lookahead” for measuring TSC and $W$ is the “width” of past time steps for
predicting the future.

For notational simplicity, we denote the topic distributions for users $i$ and
$j$ at $t$ as $i_t$ and $j_t$ respectively. Specifically, given two users $i$
and $j$ who interact at time $\tau$, $\mathrm{TSC}(i \to j, \tau)$ is computed
as follows:

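1. Formulate the two linear regression tasks:

$$\hat{j}_t = \eta_0 + \sum_{w=1}^{W} \eta_w j_{t-w}, \qquad R_1 = \sum_{t=\tau}^{\tau+L} (j_t - \hat{j}_t)' (j_t - \hat{j}_t) \tag{7}$$

$$\hat{j}_t = \eta_0 + \left( \sum_{w=1}^{W} \eta_w j_{t-w} + \sum_{w=1}^{W} \lambda_w i_{t-w} \right), \qquad R_2 = \sum_{t=\tau}^{\tau+L} (j_t - \hat{j}_t)' (j_t - \hat{j}_t) \tag{8}$$

where $\tau$ is the time point when $i$ and $j$ begin transferring information
between one another.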

2. Estimate the parameters $\{\eta_0, \ldots, \eta_W\}$ by minimizing the
least squares error in (7) using coordinate descent [7], and then estimate
only the parameters $\{\lambda_1, \ldots, \lambda_W\}$ by minimizing (8). The
first linear regression given by (7) uses $j$'s past information to predict
$j$'s future, while the second linear regression (8) uses additional
information from $i$'s past to predict $j$'s future.

3. To obtain $\mathrm{TSC}(i \to j, \tau)$, we measure how much $i$'s past
improves the prediction of $j$'s future by computing the F-statistic (F-stat),

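$$\mathrm{TSC}(i \to j, \tau) = \text{F-stat} = \frac{(R_1 - R_2) / W}{R_2 / (N_{\mathrm{obs}} - 2W - 1)}$$

where $N_{\mathrm{obs}}$ is the number of observations used in the regressions.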

Because (8) uses more parameters than (7), the sum-of-squares error $R_2$ is
never larger than $R_1$, i.e. $R_2 \le R_1$, which implies that F-stat is
always non-negative.

4. Repeat the steps for computing $\mathrm{TSC}(j \to i, \tau)$ and compare
whether $\mathrm{TSC}(i \to j, \tau) > \mathrm{TSC}(j \to i, \tau)$ or
otherwise.
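The four steps above can be sketched in a few lines of standard-library Python. Several choices here are assumptions rather than details from the text: ordinary least squares via the normal equations stands in for coordinate descent, the regression weights are scalars shared across topics, every topic at every time step counts as one observation, and the synthetic series are random numbers rather than normalized topic distributions. The caller must also pick `tau >= W` so that all lagged indices exist.

```python
import random

def least_squares(X, y):
    """Solve min ||X b - y||^2 via the normal equations and Gaussian elimination."""
    p = len(X[0])
    A = []
    for a in range(p):                       # augmented system [X'X | X'y]
        row = [sum(r[a] * r[b] for r in X) for b in range(p)]
        row.append(sum(r[a] * t for r, t in zip(X, y)))
        A.append(row)
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * p
    for c in reversed(range(p)):
        b[c] = (A[c][p] - sum(A[c][j] * b[j] for j in range(c + 1, p))) / A[c][c]
    return b

def predict_rows(X, b):
    return [sum(v * w for v, w in zip(r, b)) for r in X]

def tsc(i_series, j_series, tau, W, L):
    """F-statistic for 'the past of i helps predict j' over t in [tau, tau + L]."""
    own, other, target = [], [], []
    for t in range(tau, tau + L + 1):
        for k in range(len(j_series[t])):    # one observation per topic
            target.append(j_series[t][k])
            own.append([1.0] + [j_series[t - w][k] for w in range(1, W + 1)])
            other.append([i_series[t - w][k] for w in range(1, W + 1)])
    eta = least_squares(own, target)         # regression (7)
    residual = [y - f for y, f in zip(target, predict_rows(own, eta))]
    r1 = sum(e * e for e in residual)
    lam = least_squares(other, residual)     # only lambda is fitted in (8)
    r2 = sum((e - f) ** 2 for e, f in zip(residual, predict_rows(other, lam)))
    n_obs = len(target)
    return ((r1 - r2) / W) / (r2 / (n_obs - 2 * W - 1))

rng = random.Random(1)
T, K = 40, 3
i_series = [[rng.random() for _ in range(K)] for _ in range(T)]
j_series = [[rng.random() for _ in range(K)]]
for t in range(1, T):                        # j is driven by the past of i
    j_series.append([0.5 * j_series[t - 1][k] + 0.5 * i_series[t - 1][k]
                     + 0.01 * rng.gauss(0, 1) for k in range(K)])
f_ij = tsc(i_series, j_series, tau=2, W=2, L=30)
f_ji = tsc(j_series, i_series, tau=2, W=2, L=30)
```

Because `j_series` is built from the lagged values of `i_series`, the score in the direction from i to j should come out far larger than the reverse score.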

6 Experiments 

To evaluate the effectiveness of LDTM and the TSC calculated for pairs of
users, we require datasets that provide users’ temporal adoptions and the
interactions between users that lead to information transfer between them.
The publicly available DBLP [37] and ACM Digital Library (ACMDL) [1]
bibliographic datasets provide the information we require. We first describe
how we obtain subsets of the data from DBLP and ACMDL for our evaluation
needs. Then we evaluate the effectiveness of LDTM for several scenarios of the
dynamics matrix $A_{n,t}$:

1. LDA: To reduce LDTM to the baseline LDA, we simply set $A_{n,t}$ as the
identity matrix for every user $n$ and every time step $t$, i.e. $A_{n,t} = I$.

2. Half Decay: We set $A_{n,t}$ as a diagonal matrix with constant values of
0.5, i.e. $A_{n,t} = 0.5 \cdot I$.

3. Full Decay: We set $A_{n,t}$ as the zero matrix, i.e. $A_{n,t} = 0$.

4. LDTM: We automatically determine the values of the dynamics matrix
$A_{n,t}$.

We show that automatically estimating $A_{n,t}$ in the LDTM case gives us
better representations of authors’ temporal adoption behavior than setting
constant values for $A_{n,t}$. Using a case study as an example, we show how
the topic distributions over time for an author and his co-authors can be used
to determine the information transfer relationship between them. Finally, we
compute and compare the TSC between every pair of authors using topic
distributions from the four scenarios of the dynamics matrix $A_{n,t}$ (i.e.,
LDA, Half Decay, Full Decay and LDTM).

6.1 Data Set 

We used the DBLP and ACMDL data to obtain the required users and items.
The authors who wrote papers together are treated as users, and the words in
their papers are seen as adopted items. We used the words in the abstract for
ACMDL, and those in the paper title for DBLP. The co-authorship information
provides a time point where interaction occurred between two authors.

Given the large number of publications in DBLP and ACMDL, we only used a
subset of papers from each. We sampled a data subset that covers a wide
variety of fields in computer science, with the papers published in the
Journal of the ACM (JACM) as a seed set. We then expanded the coverage by
including other non-JACM publications by authors with at least one JACM
publication. The sample set obtained here is termed ego-1. By including the
co-authors of the authors in ego-1 and their papers, we get a larger set
called ego-2. We repeat the process once again to get ego-3.
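The expansion from ego-1 to ego-3 is a breadth-first growth over the co-authorship graph. The sketch below is one way to express it; the record layout (dictionaries with `authors` and `venue` keys), the function name, and the toy data are hypothetical and do not reflect the actual DBLP or ACMDL schemas.

```python
def expand_ego(papers, seed_venue, levels):
    """Grow an author set outward from the authors of one seed venue.

    `papers` is a list of dicts with hypothetical keys 'authors' and 'venue'.
    Returns one (author set, paper index set) pair per level: ego-1, ego-2, ...
    """
    authors = {a for p in papers if p["venue"] == seed_venue for a in p["authors"]}
    result = []
    for _ in range(levels):
        # All papers written by at least one author currently in the set.
        selected = {idx for idx, p in enumerate(papers)
                    if authors & set(p["authors"])}
        result.append((set(authors), selected))
        # The next level adds every co-author appearing on those papers.
        authors = authors | {a for idx in selected for a in papers[idx]["authors"]}
    return result

papers = [
    {"authors": ["A", "B"], "venue": "JACM"},
    {"authors": ["B", "C"], "venue": "X"},
    {"authors": ["C", "D"], "venue": "Y"},
    {"authors": ["D", "E"], "venue": "Z"},
]
ego = expand_ego(papers, seed_venue="JACM", levels=3)
print(sorted(ego[0][0]))  # ['A', 'B']
print(sorted(ego[2][1]))  # [0, 1, 2, 3]
```

Each level can only add authors and papers, which matches the rapid growth in the number of authors from ego-2 to ego-3.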


Table 1: Data Set Sizes

                 #authors   #words   period
ACMDL (ego-2)      24,569   33,044   1952-2011
ACMDL (ego-3)     157,715   44,308   1952-2011
DBLP (ego-2)       52,754   20,080   1936-2013
DBLP (ego-3)      388,092   40,463   1936-2013


Table 1 gives the sizes of the ego-2 and ego-3 datasets. DBLP has more
authors than ACMDL, because DBLP covers a longer history of publications
and has more sources of publications. On the other hand, ACMDL focuses
mainly on ACM-related publications. After pruning the stop-words and
non-frequent (less than ten occurrences) words, the ACMDL sampled dataset has
slightly more words than DBLP, as ACMDL provides words in the abstract
of publications while DBLP only has words in the paper titles. We used the
smaller ego-2 samples for experiments that require repetitions, and the much
larger ego-3 samples for experiments that only require a single run. Because
JACM lists a total of 26 major fields in Computer Science, we used 26 as the
number of topics for training our models in all the subsequent experiments.

6.2 Convergence of Log Likelihood 

We first evaluate the convergence of the log likelihood for the case where the
dynamics matrix $A_{n,t}$ is automatically computed, and for other cases where
$A_{n,t}$ is set to constant values. We used the ego-2 samples for evaluating
log likelihood convergence, because ego-2 will be used later for the
predictive evaluations.

Figures 5(a) and 5(b) show how the log likelihood varies with the number
of iterations for ACMDL (ego-2) and DBLP (ego-2). We can see that LDTM
achieves the highest likelihood in DBLP and is able to converge with log
likelihood comparable to that of Half Decay and Full Decay.

Figures 5(a) and 5(b) also reveal another interesting observation: the Full
Decay model is able to perform well in ACMDL, but not as well in DBLP.
This can be explained as follows. Since ACMDL provides words from papers’
abstracts, the information within each time step is sufficient for estimating
accurate parameters. But in DBLP, only the words in the paper titles are
available, providing less information for parameter estimation. As a result,
fully decaying the parameters of previous time steps prevents Full Decay from
leveraging the previously observed data, which explains the poor likelihood in
DBLP. The same explanation applies to LDA, which shows the opposite
performance to Full Decay.

Figure 5: Log Likelihood vs # of Iterations. (a) ACMDL (b) DBLP

Based on the ACMDL and DBLP results in Figure 5, we can see that fixing
the decay parameters of the dynamics matrix does not give consistent
performance in comparison to that of LDTM. This shows that the automatic
estimation of the dynamics matrix in LDTM can better model the different
properties of the available data.


6.3 Comparison Results on Held-out Test Set 

We compared the automatic estimation of the dynamics matrix $A_{n,t}$ for LDTM
against the fixed values of $A_{n,t}$ (i.e., Full Decay, Half Decay, LDA) in a
prediction task. We repeated the prediction experiments for five runs and took
the average results. For each run, we generated five sets of training and
testing data by hiding incremental proportions of 10% from the sampled ACMDL
(ego-2) and DBLP (ego-2) datasets. When creating the test sets, we ensure
that each subsequent test set is a superset of the previous test set. We
trained LDTM and the other baseline models on the remaining data for 50
iterations, and evaluated their predictive performance on the held-out test
sets. The predictive performance on the held-out test sets is measured in
terms of the average log likelihood for each time step $t$ (ALL@t), defined as,

$$\text{ALL@}t = \frac{\sum_{n}^{N} \sum_{m} \log p(w_{n,t,m} \mid \theta_{n,t})}{\sum_{n} M_{n,t}}$$


Essentially, ALL@t gives us a measure of how well the estimated parameters
can predict the test sets. Normalization of the log likelihood (over the total
number of words) in each time step is necessary to avoid over-deflating the
log likelihood at different time steps. For example, a larger t would have
more words and hence a smaller (more negative) log likelihood as compared to a
smaller t. The log likelihood normalization at each t would thus allow for a
better comparison across different time steps. A higher ALL@t suggests a
better predictive performance for the respective model.
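The metric can be computed with a short helper such as the one below. It assumes that the probability of a held-out item is the topic mixture, i.e. the sum over topics of the user's topic probability times the topic's item probability; that reading, the argument layout, and the toy numbers are assumptions made for illustration.

```python
import math

def all_at_t(test_items, theta, phi):
    """Average held-out log likelihood at one time step.

    test_items[n] lists the held-out item ids of user n at this time step,
    theta[n][k] is the user's topic distribution and phi[k][w] is the
    topic-item distribution.
    """
    total_loglik, total_items = 0.0, 0
    for n, items in enumerate(test_items):
        for w in items:
            # Assumed likelihood: marginalize over the latent topic.
            p = sum(theta[n][k] * phi[k][w] for k in range(len(phi)))
            total_loglik += math.log(p)
        total_items += len(items)
    return total_loglik / total_items

theta = [[1.0, 0.0], [0.5, 0.5]]
phi = [[0.5, 0.5], [0.25, 0.75]]
test_items = [[0, 1], [1]]
score = all_at_t(test_items, theta, phi)
```

Dividing by the number of held-out items keeps time steps with many adoptions comparable to time steps with few.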




Figure 6: ACMDL: Log Likelihood over Time for Held-out Test Set 

Figures 6(a)-6(d) and 7(a)-7(d) show the ALL@t for every t. All these figures
show that LDTM outperforms all other baseline models in terms of ALL@t,
while Full Decay performs the worst. Although Full Decay gives the highest log
likelihood in the convergence results shown in Figure 5(a), the results in
Figure 6 do not show the same pattern, which may be attributed to Full Decay
overfitting the data. Meanwhile, Half Decay fits the data relatively well
(cf. Figure 5), but it does not perform as well as LDTM (cf. Figure 6) and it
performs worse than LDA (cf. Figure 7). While LDA performs better than Half
Decay and Full Decay in Figure 7, it does not outperform Half Decay in Figure
6. By contrast, LDTM is able to achieve consistently good performance in both
datasets, winning over Full Decay, Half Decay and LDA. This suggests that the
dynamics matrix estimated using the algorithm in Section 4.3 can capture the
dynamic patterns of each user more accurately.

Figure 7: DBLP: Log Likelihood over Time for Held-out Test Set (panel (d): Test size=50%)

6.4 Case Study 

This section provides a case study to illustrate Granger causality among
authors. We first show how an author’s topic interests change over the years.
Subsequently, we describe how the changes in his co-authors’ topic interests
explain the changes in his own topic interests.

In this study, we focus on the profile of Professor Duminda Wijesekera (D.
Wijesekera), so as to remain consistent with the case study in our earlier work
[17]. For this case study, we performed our analysis on the DBLP (ego-3)
dataset, due to its wider coverage of authors and years. Based on our earlier
results of various temporal topic models on the DBLP dataset, we choose to
analyze the Granger causality based on the topic distributions computed by
LDTM and LDA, as the two yielded the best performance on DBLP. By comparing
LDTM and LDA, we also show the importance of using the correct latent
factor model for Granger causality analysis.

6.4.1 Granger Causality using LDTM



Figure 8: LDTM Results: Duminda Wijesekera’s and His Co-Authors’ Topic 
Interests from 1990 to 2006 


Table 2: Topics Derived from LDTM

Security    Data Mining   Logic & Computation   Distributed Computing
security    data          logic                 service
secure      mining        verification          management
scheme      fuzzy         programming           grid
efficient   databases     reasoning             computing
privacy     query         languages             framework


Figure 8(a) shows the LDTM-induced topic distribution of D. Wijesekera 
from year 1990 to 2006 (top four topics), with the corresponding topic words 
shown in Table 2. During this period, the “Security” topic has the largest 
area under the curve. For illustration purposes, we show the “Security” topic 
proportion of D. Wijesekera’s co-authors for the same time period. 

Due to space limitations, we only show in Figure 8(b) five co-authors who
collaborated frequently with D. Wijesekera. Among them, Sushil Jajodia’s and
Jaideep Srivastava’s names are placed after D. Wijesekera’s in the papers they
wrote, while Csilla Farkas’, Lingyu Wang’s and Naren Kodali’s names appear
before D. Wijesekera’s. From Figure 8(b), we can also see clearly that D.
Wijesekera’s -o- line follows the trend of Sushil Jajodia’s -*- line.

6.4.2 Granger Causality using LDA

Figure 9(a) shows the LDA-induced topic distribution of D. Wijesekera from
year 1990 to 2006 (also the top four topics), and Table 3 shows the corresponding
topic words. Instead of the “Security” topic, the “Computation & Logic” topic
occupies the largest area here.
We then analyzed the topic proportions of the “Computation & Logic” topic
for D. Wijesekera’s co-authors, as given in Figure 9(b). Unlike the previous
LDTM case, we could not find any co-authors who have significant correlation
with D. Wijesekera in the “Computation & Logic” topic. This indicates that
the accuracy of the topic distributions is important for inferring the Granger
causality between the authors.

The LDTM model is able to appropriately decay the importance of other
topics and focus on the emergence of new topics such as “Security” in D.
Wijesekera’s case. Granger causality would then allow us to find co-authors
who are socially correlated to D. Wijesekera in order to explain the emergence
or change of academic interests over time.



(a) Top Four Topics (b) Co-Authors’ Computation & Logic Topic 

Figure 9: LDA Results: Duminda Wijesekera’s and His Co-Authors’ Topic 
Interests from 1990 to 2006 


Table 3: Topics Derived from LDA

Security    Database Systems   Logic & Computation   Distributed Computing
security    video              logic                 service
secure      peer               programming           management
scheme      multimedia         verification          grid
efficient   adaptive           languages             computing
privacy     content            formal                mobile


6.5 Knowledge Discovery using Temporal Social Correlation

We now show the application of topic distributions as time series for computing
the TSC between two authors $i$ and $j$ at a time point $\tau$. The parameters
“width” and “lookahead” are both set to 4. Using the co-authorship information
in our sampled ego-3 datasets, we choose some pairs of authors and the year
of publication at time point $\tau$ in order to compute TSC. We formulate the
following three hypotheses:

1. AB: If $j$ is the first author and $i$ is the second author of a publication
written at $\tau$, then $i$ transfers information to $j$, i.e.
$\mathrm{TSC}(i \to j, \tau) > \mathrm{TSC}(j \to i, \tau)$.

2. AZ: If $j$ is the first author and $i$ is the last author of a publication
written at $\tau$, then $i$ transfers information to $j$, i.e.
$\mathrm{TSC}(i \to j, \tau) > \mathrm{TSC}(j \to i, \tau)$.

3. Bf_Af: If $j$ and $i$ are authors of a publication written at $\tau$ with
more than two authors and $j$ comes before $i$, then $i$ transfers information
to $j$, i.e. $\mathrm{TSC}(i \to j, \tau) > \mathrm{TSC}(j \to i, \tau)$.

and compute a Ratio metric for each scenario, defined as:

$$\text{Ratio} = \frac{\sum_{(i,j) \in P} I\left( \mathrm{TSC}(i \to j, \tau) > \mathrm{TSC}(j \to i, \tau) \right)}{|P|}$$

where $I(\cdot)$ is the indicator function, and $P$ is the set of all user
pairs $(i, j)$ considered in the evaluation. In other words, the Ratio may be
interpreted as the proportion (or probability) of the TSC from user $i$ to $j$
being greater than that from $j$ to $i$.

To prevent confounding factors in our experiments, we excluded the
publications that have the authors’ last names arranged in ascending order.
This removed 41% of the publications from ACMDL and 44% from DBLP, although
some of these papers might have the authors’ last names in ascending order by
coincidence.
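Both the Ratio metric and the alphabetical-order filter are simple to express in code. In the sketch below the author names and the TSC values are invented placeholders, and the case-insensitive comparison of last names is an assumption about how "ascending order" is checked.

```python
def is_alphabetical(last_names):
    """True when the author last names appear in ascending order."""
    lowered = [name.lower() for name in last_names]
    return lowered == sorted(lowered)

def ratio(pairs):
    """Fraction of (i, j) pairs with TSC(i -> j) > TSC(j -> i).

    `pairs` holds (tsc_i_to_j, tsc_j_to_i) tuples, one per evaluated pair.
    """
    wins = sum(1 for forward, backward in pairs if forward > backward)
    return wins / len(pairs)

# Placeholder author lists: the first one is in ascending order and is dropped.
papers = [["Adams", "Baker"], ["Young", "Clark"], ["Lee", "Adams", "Nolan"]]
kept = [p for p in papers if not is_alphabetical(p)]

# Placeholder (TSC(i -> j), TSC(j -> i)) values for four author pairs.
scores = [(1.2, 3.4), (0.8, 0.5), (2.0, 2.5), (0.1, 0.9)]
r = ratio(scores)
print(len(kept))  # 2
print(r)          # 0.25
```

A Ratio below 0.5 means that, for most pairs, the later-listed author's past explains the earlier-listed author's future less well than the other way round.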



Figure 10: AB: Ratio based on Time Series of Various Models. (a) ACMDL: Ratio (b) DBLP: Ratio

For every $(i, j)$ author pair who co-authored at least once, we computed the
Ratio for each time step $\tau$. We then analyzed the Ratio values for every
pair with respect to the number of time steps in which they had sustained the
co-author relationship. Figures 10 to 12 show the Ratio values versus the
number of time steps the author pairs have sustained their co-authorships.
Subplots (a) of Figures 10 to 12 present the Ratio values for ACMDL, while
subplots (b) give the Ratio values for DBLP. Only bins with more than 90 data
points are shown in the figures.

Figure 11: AZ: Ratio based on Time Series of Various Models

Figure 12: Bf_Af: Ratio based on Time Series of Various Models. (a) ACMDL: Ratio (b) DBLP: Ratio

Figures 10 to 12 reveal several interesting phenomena. To describe the
phenomena, we use the notation $I$ to denote the set of authors whose names
are placed at the back, and $J$ to denote the set of authors whose names are
placed in front. In general, since we have earlier filtered out the papers
with alphabetical ordering, researchers ($J$) who do the bulk of the research
have their names placed in front of their co-authors ($I$).

1. All models and hypotheses show that the Ratio is always below 0.5. This
indicates that $I$ does not necessarily influence $J$. On the contrary, the
results show a high probability of $J$ influencing $I$, i.e. researchers who
do the most work influence their co-authors.

2. The Ratio is always on a downtrend, which indicates that the influence $I$
has on $J$ decreases over time. As researchers progress in their research
careers, the influence their co-authors have on them decreases.

3. Comparing the Ratio of Figures 10 and 12, the second author $i$ has more
influence on the first author $j$, as compared to the generic case where $j$
can be any author (e.g. 2nd, 3rd, etc.) and $i$ comes after $j$.

4. Comparing the Ratio of Figures 10 and 11, the last author $i$ influences
the first author $j$ more than the second author does.

Based on these results, we can conclude that, for pairs of authors who wrote 
a paper together, it is highly likely that the author whose name appears in front 
(Granger) causes those whose names appear at the back to change their topic 
distributions over time. We therefore reject the earlier hypotheses of AB, AZ 
and Bf_Af. 


7 Conclusion 

This paper presents a means for identifying Temporal Social Correlation (TSC) 
based on latent topic representation of item adoptions that users perform over 
time. We propose a Linear Dynamical Topic Model (LDTM) that synergizes 
the merits of probabilistic topic models and Linear Dynamical Systems (LDS) 
in order to capture users’ adoption behavior over time. The EM algorithm for 
solving the model draws upon Gibbs Sampling and Kalman Filter for inference 
in the E-Step, followed by the M-Step which minimizes the Kullback-Leibler 
(KL) divergence between the prior and posterior distributions for estimating the 
dynamics matrix. By taking into account both the stability and non-negativity 
constraints, we derive a dynamics matrix that represents how users decay their 
adoption behaviors and preferences over time. 

By using the users’ topic distributions at different time steps, we construct
each user’s time series and compare it with their co-authors’ using
Granger-causal tests. Our experiments on bibliographic datasets demonstrate
that, by employing Granger causality on the time series, we can calculate the
TSC between authors of a paper and discover that the ordering of authors’
names plays a role in how information transfers among them.

Ultimately, all the measurements we made are intended to further the science
of predicting the future. However, it remains to be seen whether quantifying
social influence through TSC could be used for making recommendations to users
on what items to adopt; this could be a future direction of our work.

Algorithm 1 LDTM Inference

Input: Adoption data for each user $n$ at each time step $t$
Output: Estimated parameters
Define: $\psi_{n,t,k}$ is the number of times user $n$ at time $t$ generates topic $k$.
Define: $\xi_{k,m}$ is the number of times topic $k$ generates item $m$.

// Initialization
for $n \leftarrow 1$ to $N$ do
    for $t \leftarrow 1$ to $T_n$ do
        for $m \leftarrow 1$ to $M_{n,t}$ do
            $k \leftarrow$ uniformRandom$(1, K)$
            $\psi_{n,t,k} \leftarrow \psi_{n,t,k} + 1$
            $\xi_{k,m} \leftarrow \xi_{k,m} + 1$
            $z_{n,t,m} \leftarrow k$
        end for
    end for
end for

// Gibbs sampling iterations
repeat
    for $n \leftarrow 1$ to $N$ do
        // Estimating topic distribution parameters: Kalman filter
        for $t \leftarrow 1$ to $T_n$ do
            $x_{n,t|t-1} \leftarrow A_{n,t-1} \cdot x_{n,t-1|t-1}$
            for $m \leftarrow 1$ to $M_{n,t}$ do
                $k \leftarrow z_{n,t,m}$
                $\psi_{n,t,k} \leftarrow \psi_{n,t,k} - 1$
                $\xi_{k,m} \leftarrow \xi_{k,m} - 1$
                $k \leftarrow$ sample$(m, x_{n,t|t-1} + \psi_{n,t}, \xi_k, \alpha, \beta)$  // Sample using Equation (6)
                $\psi_{n,t,k} \leftarrow \psi_{n,t,k} + 1$
                $\xi_{k,m} \leftarrow \xi_{k,m} + 1$
                $z_{n,t,m} \leftarrow k$
            end for
            $x_{n,t|t} \leftarrow x_{n,t|t-1} + \psi_{n,t}$
        end for
        // Estimating decay parameters: KL divergence minimization
        for $t \leftarrow 2$ to $T_n$ do
            for $k \leftarrow 1$ to $K$ do
                Update the diagonal entries $\mu_{n,t-1,k} \in A_{n,t-1}$ using Equation (2)
            end for
        end for
    end for
until maximum iterations